ABSTRACT Bats are reservoirs for emerging zoonotic viruses that can have a profound impact on human and animal health, including lyssaviruses, filoviruses, paramyxoviruses, and severe acute respiratory syndrome coronaviruses (SARS-CoVs). In the course of a project focused on pathogen discovery in contexts where human-bat contact might facilitate more efficient interspecies transmission of viruses, we surveyed gastrointestinal tissue obtained from bats collected in caves in Nigeria that are frequented by humans. Coronavirus consensus PCR and unbiased high-throughput pyrosequencing revealed the presence of coronavirus sequences related to those of SARS-CoV in a Commerson’s leaf-nosed bat (Hipposideros commersoni). Additional genomic sequencing indicated that this virus is unique: unlike subgroup 2b CoVs, which include SARS-CoV, it has three overlapping open reading frames between the M and N genes and two conserved stem-loop II motifs. Phylogenetic analyses in conjunction with these features suggest that this virus represents a new subgroup within group 2 CoVs.


IMPORTANCE Bats (order Chiroptera, suborders Megachiroptera and Microchiroptera) are reservoirs for a wide range of viruses that cause diseases in humans and livestock, including the severe acute respiratory syndrome coronavirus (SARS-CoV), responsible for the global SARS outbreak in 2003. The diversity of viruses harbored by bats is only beginning to be understood, owing to expanded wildlife surveillance and the development and application of new tools for pathogen discovery. This paper describes a new coronavirus, one with a distinctive genomic organization that may provide insights into coronavirus evolution and biology.
Coronaviruses (order Nidovirales, family Coronaviridae, subfamily Coronavirinae) infect a wide range of vertebrates and cause respiratory, enteric, or, less frequently, neurological diseases (1, 2). Coronaviruses were originally divided into three groups based on their antigenic cross-reactivities and nucleotide sequences (3). They have recently been reclassified by the International Committee on Taxonomy of Viruses into three genera, designated Alphacoronavirus (former group 1), Betacoronavirus (former group 2), and Gammacoronavirus (former group 3) (4). Whereas the alphacoronaviruses and betacoronaviruses are associated with diseases of mammals, including humans, the gammacoronaviruses are implicated chiefly in diseases of birds. Until the emergence of severe acute respiratory syndrome (SARS) in 2003 (6), interest in coronaviruses focused largely on their impact on domestic porcine and avian husbandry and their utility in animal models of virus-induced demyelination (5). Thereafter, with recognition of the causative agent, SARS coronavirus (SARS-CoV) (7-10), and of the presence of SARS-CoV-like viruses in Chinese horseshoe bats (Rhinolophus spp.) (11), efforts to explore the genetic diversity of coronaviruses and their host range intensified (12).
Bats are suggested to be important reservoir hosts of many zoonotic viruses with significant impact on human and animal health, including lyssaviruses, henipaviruses, filoviruses, and coronaviruses (13-17). Viruses of bats may be transmitted to humans directly through bites or via exposure to saliva, fecal aerosols, or infected tissues, as well as indirectly through contact with infected intermediate hosts, such as swine (18). In the course of a project focused on pathogen discovery in situations where human-bat contact might facilitate more efficient interspecies transmission of emerging viruses, we surveyed bats in Nigeria. Through consensus PCR (cPCR) and unbiased high-throughput pyrosequencing (UHTS) of bat tissue samples, we identified a coronavirus that is most closely related to members of the genus Betacoronavirus subgroup 2b, which includes SARS-CoV and SARS-CoV-like viruses. However, the genomic organization of this coronavirus, obtained from a Commerson’s leaf-nosed bat (Hipposideros commersoni), is unique in that it comprises three overlapping open reading frames (ORFs) between the M and N genes and two conserved stem-loop II motifs (s2m). Based on these observations and phylogenetic analyses, we propose that this new member of the family Coronaviridae, tentatively named Zaria bat coronavirus (ZBCoV) after the city near which the bat was captured, represents a new subgroup of group 2 CoVs.

FIG 1 (A) Map of Nigeria showing the locations of bat collection sites. (B) Photograph of a male Commerson’s leaf-nosed bat (Hipposideros commersoni), courtesy of Ivan V. Kuzmin, reproduced with permission.


RESULTS 

Identification of a coronavirus in intestinal tissue of a Commerson’s leaf-nosed bat (Hipposideros commersoni). Total RNA extracts from gastrointestinal tract (GIT) specimens obtained from 33 bats of six different species (Eidolon helvum, Hipposideros commersoni, Pipistrellus sp., Rousettus aegyptiacus, Scotophilus nigrita, and Scotophilus leucogaster) captured at two different sites from a roost inside a cave in Nigeria (Fig. 1A) were screened for the presence of coronaviruses by consensus PCR of a 400-nucleotide (nt) fragment of the RNA-dependent RNA polymerase (RdRp) gene. One specimen, obtained from a Commerson’s leaf-nosed bat (Fig. 1B), yielded products that shared no more than 70% nt identity with any known coronavirus. RNA from this ZBCoV-positive specimen was submitted for UHTS, resulting in a library of 74,133 sequence reads. Alignment of unique singleton reads and assembled contiguous sequences against the GenBank database (http://www.ncbi.nlm.nih.gov/) using the Basic Local Alignment Search Tool (Blastn and Blastx) (19) indicated coverage of approximately 6,500 nt of sequence distributed along coronavirus genome scaffolds, with homology to regions of the replicase, spike (S), and nucleocapsid (N) sequences.

Genome organization and coding potential of ZBCoV. Additional genomic sequence of ZBCoV was determined by filling in gaps between UHTS reads, applying consensus PCRs, and performing 3’ and 5’ rapid amplification of cDNA ends (RACE). Overlapping primer sets based on the draft genome were synthesized to facilitate sequence validation by conventional dideoxy sequencing. Because the sample was exhausted, we were unable to completely sequence the open reading frame 1ab (ORF 1ab) region (Fig. 2A).

FIG 2 Genome organization of ZBCoV in comparison to that of representative coronaviruses from subgroup 2b. (A) Overall genome organization of ZBCoV. The ORF 1ab, spike (S), envelope (E), membrane (M), and nucleocapsid (N) genes are shown as gray arrows, whereas putative accessory genes ORF 3, ORF 6, ORF 7, and ORF 8 are indicated as 3, 6, 7, and 8 and illustrated by green arrows. The following conserved functional domains in ORF 1ab are represented by boxes: papain-like protease (PL), 3C-like protease (3CL), RNA-dependent RNA polymerase (RdRp), metal ion-binding domain (MB), and helicase (Hel). The two regions in ORF 1ab where sequences are incomplete are indicated by black lines. (B) Expanded diagram of the 3’ region of the ZBCoV genome in comparison to representative CoVs from subgroup 2b. TRS motifs and s2m are represented by black arrowheads and vertical lines, respectively.

TABLE 1 ORFs and putative TRS motifs

ORF            Length (nt)   Length (aa)   TRS
1ab            NC(a)         NC(a)         ACGAAC…AUG
Spike          3,897         1,299         ACGAACAUG
3              750           250           ACGAAC…AUG
Envelope       237           79            ACGAAC…AUG
Membrane       729           243           ACGAAC…AUG
6              147           49            NA(b)
7              237           79            ACGAAC…AUG
8              654           218           ACGAAC…AUG
Nucleocapsid   1,260         420           ACGAAC…AUG

a NC, not complete.
b NA, not applicable.

ZBCoV has a genome organization similar to that of other coronaviruses, with the characteristic gene order 5’-replicase ORF 1ab-spike (S)-envelope (E)-membrane (M)-nucleocapsid (N)-3’. The 5’ and 3’ ends contain short untranslated regions of 297 nt and 363 nt, respectively. The conserved putative transcription regulatory sequence (TRS) motif 5’-ACGAAC-3’ identified in subgroup 2b, 2c, and 2d viruses (2) is present in ZBCoV at the 3’ end of the leader sequence and upstream of the potential AUG initiation codons of each ORF except ORF 6 (Table 1).
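To make the TRS criterion concrete, the Python sketch below scans a sequence for the 5’-ACGAAC-3’ core motif and reports the first downstream AUG within a spacer window. The function name `find_trs_starts`, the 50-nt maximum spacer, and the toy input are illustrative assumptions, not parameters used in this study; a real analysis would also check that each AUG opens an ORF of plausible length.

```python
def find_trs_starts(seq: str, motif: str = "ACGAAC", max_spacer: int = 50):
    """Return (motif_index, aug_index, spacer_length) for each TRS core motif
    followed by an AUG start codon within max_spacer nucleotides.

    Indices are 0-based. RNA input (U) is normalized to DNA (T).
    max_spacer is a hypothetical cutoff chosen for illustration.
    """
    dna = seq.upper().replace("U", "T")
    hits = []
    i = dna.find(motif)
    while i != -1:
        after = i + len(motif)
        # The slice end lets an ATG begin at most max_spacer nt after the motif.
        aug = dna.find("ATG", after, after + max_spacer + 3)
        if aug != -1:
            hits.append((i, aug, aug - after))
        i = dna.find(motif, i + 1)
    return hits


# Toy sequence (not ZBCoV data): TRS core, 2-nt spacer, then AUG.
print(find_trs_starts("GGACGAACUUAUGCC"))  # [(2, 10, 2)]
```

Searching for the first downstream AUG is a simplification; in practice the TRS-proximal AUG is then checked against the ORF predictions summarized in Table 1.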

All domains within the replicase polyproteins of coronaviruses that are implicated in viral replication are found in ZBCoV, including the papain-like protease (PLpro), 3C-like protease (3CLpro), RNA-dependent RNA polymerase (RdRp), and helicase (Hel) domains (Fig. 2A). ORFs consistent with the S, E, M, and N proteins present in all other coronaviruses are also present in ZBCoV (Table 1; Fig. 2). Pairwise identity (I) and similarity (S) comparisons of the deduced amino acid sequences of ZBCoV with those of representative coronaviruses in other groups showed that the predicted proteins of ZBCoV are more similar to those of subgroup 2b CoVs than to those of other subgroups, with Hel and RdRp showing the highest conservation (Hel: I, 80%; S, 90%; RdRp: I, 74%; S, 85%) and the S protein the lowest (I, 36 to 38%; S, 50 to 53%) (http://cait.cumc.columbia.edu:88/dept/greeneidlab/IdentificationofaSARS-Coronavirus-likevirusinaleaf-nosedbatinNigeria.html).
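The identity (I) and similarity (S) percentages above are computed from aligned protein sequences. The Python sketch below shows one way to compute them, assuming (i) the two sequences are already aligned with "-" for gaps, (ii) columns with a gap in either sequence are excluded from the denominator, and (iii) "similar" means both residues fall into one of the hypothetical physicochemical groups in `SIMILAR_GROUPS`. Published tools usually define similarity through a substitution matrix such as BLOSUM62 and may treat gaps differently, so these conventions are illustrative only.

```python
# Hypothetical residue groups used to score "similar" substitutions.
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P")


def _similar(a: str, b: str) -> bool:
    """True if residues a and b are identical or share a group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Percent identity and similarity over ungapped columns of a pairwise alignment."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln1.upper(), aln2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("alignment has no ungapped columns")
    n = len(pairs)
    identical = sum(a == b for a, b in pairs)
    similar = sum(_similar(a, b) for a, b in pairs)
    return 100.0 * identical / n, 100.0 * similar / n


# Toy alignment: I/L and K/R count as similar but not identical.
print(identity_similarity("IL-K", "LL-R"))  # identity 100/3 (about 33.3), similarity 100.0
```

Because similarity also counts identical pairs, S is always at least as large as I, which matches the pattern of the values reported above.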

The putative spike (S) protein of ZBCoV, 1,299 amino acids (aa) in length, is slightly larger than those of other subgroup 2b CoVs (see Table S8 in the supplemental material). The ZBCoV S protein showed the highest amino acid conservation with those of human and civet SARS-CoV (I, 38%; S, 53%) (http://cait.cumc.columbia.edu:88/dept/greeneidlab/IdentificationofaSARS-Coronavirus-likevirusinaleaf-nosedbatinNigeria.html). Pfam (20) analysis identified a spike receptor binding domain (PF09408), corresponding to the immunogenic receptor binding domain of SARS-CoV that binds angiotensin-converting enzyme 2 (ACE2), as well as the coronavirus S1 (PF01600) and S2 (PF01601) spike glycoprotein domains. Transmembrane region prediction (TMHMM 2.0) (21) revealed a long ectodomain (aa 1 to 1240), a transmembrane domain near the C-terminal end (aa 1241 to 1263), and a short cytoplasmic tail (aa 1264 to 1298). A signal peptide was predicted by SignalP 3.0 (P = 1) (22), with a cleavage site (P = 0.768) between residues A16 and A17. NetNGlyc 1.0 identified 25 putative N-linked glycosylation sites. The S protein of ZBCoV displays major sequence differences compared to that of subgroup 2b CoVs, especially in the S1 domain involved in receptor binding. The critical residues suggested to be important for cleavage of the SARS-CoV S protein are present in the S protein of ZBCoV (23-25) (see Fig. S1A in the supplemental material). Motifs at the carboxyl terminus of the S protein that are conserved among coronaviruses are also found in the ZBCoV S protein, including the conserved motif Y(X)KWPW(Y/W)(V/I)WL, present as Y1537EKWPWYIWL, and the cysteine-rich cytoplasmic tail (10) (see Fig. S1B in the supplemental material).

In addition to the five genes present in all coronavirus genomes, coronaviruses also have several group-specific genes between the S gene and the 3’ end of the genome that encode accessory proteins (Fig. 2) (26, 27).

An ORF (ORF 3) encoding a putative 250-aa protein was observed between the S and E genes of ZBCoV (Table 1). ORF 3 corresponds to the genomic position of ORF 3a in subgroup 2b CoVs. As in subgroup 2b CoVs, ORF 3 is the largest accessory gene of ZBCoV; it is 75 nt shorter than ORF 3a of subgroup 2b CoVs (see Table S8 in the supplemental material). ORF 3 shows 21 to 23% aa identity and 31 to 35% aa similarity to the ORF 3a protein of subgroup 2b CoVs (see Table S9 in the supplemental material). Pfam analysis showed a relationship with PF11289, a viral protein family of unknown function; TMHMM analysis
predicts the presence of 4 transmembrane regions, spanning res- 
idues P,3 to L¢s, Azz to Egy; Voo to L131, and Yj 9¢ to V2. NetOGlyc 
3.1 predicted two potential O-glycosylation sites in ORF 3 of ZBCoV. ORF 3 contains only a portion of the cysteine-rich domain identified in the ORF 3a protein of SARS-CoV; however, the cysteine potentially involved in ORF 3a protein polymerization (28) is present in ORF 3. No signal peptide, YXXΦ motif, or diacidic motif was identified in ORF 3 of ZBCoV (29).

ZBCoV has a set of ORFs located between the M and N genes that is not shared by any known coronavirus. These ORFs, ORF 6, ORF 7, and ORF 8, encode predicted proteins of 49, 79, and 218 aa, respectively (Table 1). A TRS was identified upstream of ORF 7 and ORF 8 but not ORF 6. ORF 6 overlaps the 3’ end of the M gene by 101 nt, ORF 7 overlaps ORF 6 by 31 nt, and ORF 8 overlaps ORF 7 and the N gene by 83 and 35 nt, respectively. Blastx and Pfam analyses of ORF 6, ORF 7, and ORF 8 revealed no significant similarities or functional domains. Pfam analysis of ORF 7 indicated only nonsignificant associations with the PRA1 (prenylated Rab acceptor 1) protein family (PF03208) (E value = 0.02) and the 7-transmembrane G-protein-coupled receptor protein family (PF10323) (E value = 0.025). TMHMM analysis of ORF 7 suggested the presence of a transmembrane region between residues
L,, and I,,. No signal peptide was predicted. 

TMHMM and SignalP analyses of ORF 6 indicated no transmembrane region or signal peptide. TMHMM analysis of ORF 8 predicted two transmembrane regions, and a third, downstream transmembrane region was predicted by TMpred (30). SignalP predicted a signal peptide (P = 0.988) with a putative cleavage site (P = 0.804) between residues G29 and A30.

At only 788 nt, the region between the M and N genes of ZBCoV is considerably shorter than the corresponding regions of subgroup 2b CoVs (see Table S8 in the supplemental material). Alignment of this region of ZBCoV with those of subgroup 2b CoVs indicated large deletions in ZBCoV (see Fig. S2 in the supplemental material).
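The 788-nt length can be cross-checked against Table 1 and the overlaps reported for ORFs 6 to 8. If the three ORFs form a chain in which each overlaps only its immediate neighbors, the region between the last nucleotide of M and the first nucleotide of N equals the summed ORF lengths minus the summed overlaps: (147 + 237 + 654) − (101 + 31 + 83 + 35) = 1,038 − 250 = 788 nt. The Python sketch below lays the ORFs out explicitly under that assumption; the coordinate convention (inclusive positions relative to the last nucleotide of M at position 0, with negative values falling inside M) and the function name are illustrative, not taken from the original analysis.

```python
# ORF lengths (nt) from Table 1; overlaps (nt) from the text.
# Assumption: each ORF overlaps only the features immediately up- and downstream.
CHAIN = [
    ("ORF 6", 147, 101),  # (name, length, overlap with the preceding feature)
    ("ORF 7", 237, 31),
    ("ORF 8", 654, 83),
]
N_OVERLAP_WITH_ORF8 = 35


def layout_m_to_n(m_end: int = 0):
    """Return inclusive ORF coordinates, the first nt of N, and the number of
    nucleotides strictly between the last nt of M and the first nt of N."""
    coords = {}
    prev_end = m_end
    for name, length, overlap in CHAIN:
        start = prev_end - overlap + 1  # shares `overlap` nt with the preceding feature
        end = start + length - 1
        coords[name] = (start, end)
        prev_end = end
    n_start = prev_end - N_OVERLAP_WITH_ORF8 + 1
    return coords, n_start, n_start - m_end - 1


coords, n_start, gap = layout_m_to_n()
print(coords)        # {'ORF 6': (-100, 46), 'ORF 7': (16, 252), 'ORF 8': (170, 823)}
print(n_start, gap)  # 789 788
```

The layout also confirms that ORF 8 does not reach back into ORF 6 (it starts at 170, after ORF 6 ends at 46), so the chain assumption is internally consistent with the reported overlaps.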

Another distinctive genomic feature of ZBCoV is the presence, downstream of the N gene, of two conserved motifs corresponding to the conserved stem-loop II motif (s2m) (31). A single s2m is observed in coronaviruses from subgroups 2b, 3a, and 3c, in astroviruses, and in the picornavirus equine rhinitis B virus (ERBV) (31-33) (see Fig. S3A in the supplemental material). Alignment of the 3’ end of ZBCoV with those of subgroup 2b CoVs showed deletions in the subgroup 2b CoV genomes at the position of the second ZBCoV s2m (see Fig. S3B). The two s2m of ZBCoV are almost identical in sequence and are separated by 19 nt (see Fig. S3B). mfold prediction (34) of RNA secondary structure indicated that both s2m fold into RNA stem-loop motifs (see Fig. S3C).

FIG 3 Phylogenetic analysis of the 3CLpro, RdRp, Hel, S, M, and N proteins of ZBCoV. Unrooted maximum likelihood phylogenies of the 3CLpro (A), RNA-dependent RNA polymerase (B), helicase (C), spike (D), membrane (E), and nucleocapsid (F) proteins. All phylogenies were constructed using the complete amino acid alignments of each protein, with the exception of RdRp (partial region available) and spike (only an 884-aa region could be reliably aligned). The scale bar indicates the number of substitutions per amino acid site. The numbers at each branch node represent the maximum likelihood bootstrap support; only major nodes where values exceed 70% are shown. The CoV subgroups are indicated as 1a and 1b, 2a to 2d, and 3a to 3c, and the following sequences obtained from GenBank were included, with the GenBank accession numbers given in parentheses: PRCV, porcine respiratory coronavirus (DQ811787); FIPV, feline infectious peritonitis virus (AY994055); HCoV-229E, human coronavirus 229E (NC_002645); HCoV-NL63, human coronavirus NL63 (NC_005831); BtCoV-512/2005, bat coronavirus 512/2005 (NC_009657); BtCoV-HKU2, bat coronavirus HKU2 (NC_009988); BtCoV-1B, bat coronavirus 1B (NC_010436); BtCoV-1A, bat coronavirus 1A (NC_010437); BtCoV-HKU8, bat coronavirus HKU8 (NC_010438); BCoV, bovine coronavirus (NC_003045); HCoV-OC43, human coronavirus OC43 (NC_005147); HCoV-HKU1, human coronavirus HKU1 (NC_006577); MHV, mouse hepatitis virus (NC_006577); PHEV, porcine hemagglutinating encephalomyelitis virus (NC_007732); ECoV, equine coronavirus (NC_010327); BtSARS-CoV HKU3, bat SARS coronavirus HKU3 (NC_009694); CtSARS-CoV SZ3, civet SARS coronavirus SZ3 (AY304486); SARS-CoV, SARS coronavirus (NC_004718); BtSARS-CoV Rp3, bat coronavirus Rp3 (NC_009693); BtSARS-CoV Rf1/2004, bat coronavirus Rf1/2004 (NC_009695); BtSARS-CoV RM1, bat coronavirus RM1 (NC_009696); BtCoV-HKU4, bat coronavirus HKU4 (NC_009019); BtCoV HKU5, bat coronavirus HKU5 (NC_009020); BtCoV HKU9, bat coronavirus HKU9 (NC_009021); IBV, infectious bronchitis virus (NC_001451); TCoV, turkey coronavirus (NC_010800); SW1, beluga whale coronavirus (NC_010646); BuCoV HKU11, Bulbul coronavirus HKU11 (NC_011548); ThCoV HKU12, thrush coronavirus HKU12 (NC_011549); and MuCoV HKU13, Munia coronavirus HKU13 (NC_011550).

Phylogenetic analyses. Phylogenetic trees constructed from 3CLpro, RdRp, Hel, S, M, and N amino acid sequences of ZBCoV and representative coronaviruses show that ZBCoV is most closely related to, but distinct from, the subgroup 2b CoVs, which include SARS-CoV and SARS-CoV-like viruses (Fig. 3). This finding is in accord with the results of pairwise amino acid comparisons of ZBCoV and other coronaviruses (http://cait.cumc.columbia.edu:88/dept/greeneidlab/IdentificationofaSARS-Coronavirus-likevirusinaleaf-nosedbatinNigeria.html). To further define the phylogenetic position of ZBCoV, an additional phylogeny was constructed using a conserved 659-nt sequence of RdRp, and the time to the most recent common ancestor (TMRCA) of ZBCoV and related coronaviruses was estimated. Based on the best-fit model (SRD06 with an informative rate prior), this analysis indicated that ZBCoV is most closely related to GhanaBt-CoV, a recently identified coronavirus found in bats in Ghana (35) (Fig. 4). Furthermore, ZBCoV and GhanaBt-CoV together form a well-supported clade distinct from that of the subgroup 2b CoVs. The TMRCA of ZBCoV and GhanaBt-CoV was estimated at 1,417 years before present (ybp) (95% highest posterior density [HPD] = 267 to 3,061 ybp). The TMRCA of the ZBCoV/GhanaBt-CoV clade and subgroup 2b CoVs was estimated at 3,047 ybp (95% HPD = 714 to 6,205 ybp), whereas the TMRCA of SARS-CoVs and SARS-CoV-like viruses was only 515 ybp (95% HPD = 132 to 1,067 ybp). TMRCAs between subgroup 2b CoVs and the other coronavirus groups are not provided, because nucleotide site saturation at deeper phylogenetic levels can artificially produce estimates that are too recent.

FIG 4 Estimation of the time of divergence between ZBCoV and representative coronaviruses. Bayesian MCMC phylogeny of a 659-nt region of the RNA-dependent RNA polymerase gene of ZBCoV and representative members of group 1, 2, and 3 coronaviruses. The host bat species and their geographic origins (*, Africa; **, Asia) are indicated for ZBCoV, GhanaBtCoV, and subgroup 2b CoVs. The times given at branch tips represent the dates of viral sampling, and the tree is rooted through the use of a relaxed molecular clock. Bayesian posterior probability values greater than 0.8 are shown above the branches leading to each major node. The mean TMRCAs for the taxa in subgroup 2b CoVs and ZBCoV are given below each branch, with the 95% highest posterior densities indicated in parentheses. The following sequences from GenBank were included, with the GenBank accession numbers given in parentheses: for subgroup 1a CoVs, feline coronavirus (FJ938055) and canine coronavirus (GQ477367); for subgroup 1b CoVs, bat coronavirus HKU2 (DQ249213), bat coronavirus BtCoV/512/2005 (DQ648858), and human coronavirus NL63 (DQ445911); for subgroup 2a CoVs, murine hepatitis virus (AB551247), human coronavirus HKU1 (AY597011, DQ422731, DQ422728, DQ422732, DQ422737, and DQ422733), bovine respiratory coronavirus (AF220295, AF391541, AF391542, EF424615, EF424620, FJ938066, and U00735), equine coronavirus (EF446615), human enteric coronavirus 4408 (FJ415324), human coronavirus OC43 (AY391777 and AY903460), and waterbuck coronavirus (FJ425184); for subgroup 2b CoVs, bat SARS coronavirus Rf1 (DQ412042 and DQ648856), SARS coronavirus (AY313906, AY545914, AY559085, AY559097, AY595412, DQ071615, FJ882929, FJ882931, FJ882941, FJ882944, FJ882959, and FJ88686), bat SARS coronavirus HKU3 (DQ084199), and bat SARS coronavirus RM1 (DQ412043); for subgroup 2c CoVs, bat coronavirus HKU5 (DQ249217 and DQ249218), bat coronavirus HKU4 (DQ074652), and bat coronavirus BtCoV/133/2005 (DQ648794); for subgroup 2d CoVs, bat coronavirus HKU9-1 (EF065513), bat coronavirus HKU9-2 (EF065514), bat coronavirus HKU9-3 (EF065515), and bat coronavirus HKU9-4 (EF065516); and for subgroup 3a CoVs, avian infectious bronchitis virus (AY514485, AY641576, AY646283, DQ001339, DQ646405, EU714029, FJ888351, FN430414, FN430415, HM245923, and HM245924) and turkey coronavirus (GQ427174, GQ427175, and GQ427176).

Whereas the mean pairwise nucleotide similarity of the partial RdRp gene region was 85% (standard deviation [SD] = 9.75) within coronavirus subgroups (excluding ZBCoV/GhanaBt-CoV), the mean pairwise similarity between coronavirus subgroups was 66% (SD = 5.14) (see Fig. S4 in the supplemental material). According to the Mann-Whitney U test, these distributions differ significantly (P < 0.0001). Additionally, whereas the mean pairwise similarity within the ZBCoV/GhanaBt-CoV clade was 85% (SD = 9.01), the mean pairwise similarity between the ZBCoV/GhanaBt-CoV clade and subgroup 2b CoVs was only 73% (SD = 0.84); these distributions also differ significantly (Mann-Whitney U test, P = 0.0092). Together, these findings indicate that the clade containing ZBCoV and GhanaBt-CoV should be considered a separate subgroup within group 2 CoVs, distinct from subgroup 2b CoVs (see Fig. S4 in the supplemental material).
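The Mann-Whitney U test used above ranks the pooled values and asks whether one group's ranks are systematically higher than the other's. A minimal standard-library Python sketch is shown below, using average ranks for ties and the large-sample normal approximation without tie or continuity corrections. The paper does not state which software or corrections were applied, so this is illustrative only; the inputs would be the lists of within-group and between-group pairwise similarities.

```python
import math


def average_ranks(values):
    """1-based ranks of values, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U1, U2, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                  # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return u1, u2, p


# Toy data (not from this study): two completely separated groups of three.
u1, u2, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u1, u2, round(p, 3))
```

With samples this small the normal approximation is crude; for real analyses, `scipy.stats.mannwhitneyu` offers exact and continuity-corrected variants.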


DISCUSSION 


Differences in phylogenetic relationships and genomic organization, together with the low amino acid similarities of the ZBCoV ORF 3 and S proteins to the ORF 3a and S proteins of subgroup 2b CoVs, suggest that ZBCoV represents a new subgroup of coronaviruses within the group 2 CoVs. Although ZBCoV shares features with subgroup 2b CoVs, including the TRS, a single PLpro, ORFs between the M and N genes, and the presence of the s2m, ZBCoV forms a branch distinct from subgroup 2b CoVs in all phylogenetic trees analyzed. Furthermore, it differs from subgroup 2b CoVs in that ZBCoV contains three (versus four to five) ORFs between the M and N genes and has two (versus one) s2m.

Whereas the S proteins of subgroup 2b CoVs share 78 to 98% aa sequence identity, the deduced S protein of ZBCoV has only 36 to 38% identity with those of subgroup 2b CoVs. Despite this limited primary sequence conservation, particularly in the S1 domain, Pfam analyses identified a domain in the ZBCoV spike corresponding to the receptor binding domain that binds ACE2, the cellular receptor for SARS-CoV (36). However, the residues in SARS-CoV that interact with the human ACE2 molecule are not conserved in ZBCoV, suggesting that human ACE2 is not a bona fide receptor for ZBCoV (37).

ORF 3, located between the S and E genes of ZBCoV, is slightly shorter than the 3a genes of subgroup 2b CoVs, and its product has at most 22% aa identity to the 3a proteins of subgroup 2b CoVs, which share 81 to 98% aa identity with one another. ORF 3 is predicted to contain four transmembrane domains with extracellular N and C termini, whereas ORF 3a of SARS-CoV is predicted to contain three transmembrane domains with an extracellular N terminus and an intracellular C terminus (28, 29). Whereas four O-glycosylation sites are predicted in the ORF 3a protein of SARS-CoV (38), only two putative O-glycosylation sites were identified in ORF 3 of ZBCoV. The 3a protein of SARS-CoV has a cysteine-rich region important for polymerization and ion channel activity (28), as well as YXXΦ and diacidic motifs suggested to be involved in intracellular trafficking (29). These domains were recently suggested to be important for the proapoptotic function of ORF 3a of SARS-CoV (39). However, ORF 3 of ZBCoV contains only a portion of the cysteine-rich domain and has no YXXΦ or diacidic motifs. In contrast to human and civet SARS-CoV and bat Rf1/2004, ZBCoV has no ORF 3b. The 3b protein may function as an interferon antagonist (40).

ZBCoV contains a unique set of ORFs located between the M 
and N genes. In subgroup 2b CoVs, ORF 6, ORF 7, and ORF 8 
between the M and N genes do not overlap. In contrast, the three 
ORFs between the M and N genes overlap in ZBCoV. Alignment 
with subgroup 2b CoVs indicated deletions in ZBCoV, and as a 
result, one continuous ORF, ORF 8, is present in ZBCoV in place 
of ORFs 7a, 7b, 8, 8a, and 8b of subgroup 2b CoVs. 

As with SARS-CoV, the putative products of ORF 6, ORF 7, and ORF 8 of ZBCoV show no sequence homology to other viral proteins. No TRS is found upstream of ORF 6, suggesting that if ORF 6 encodes a bona fide protein, that protein is likely expressed from the subgenomic mRNA of the M gene. There is precedent in SARS-CoV for functional bicistronic RNAs in the expression of ORF 3b, ORF 7b, ORF 8b, and ORF 9b (26, 41). Coronaviruses possess accessory genes whose size and location are group specific (2). By analogy to SARS-CoV, ORF 6, ORF 7, and ORF 8 of ZBCoV may encode accessory proteins important for virus-host interactions that may contribute to virulence and pathogenesis (26). Recent studies suggest that the SARS-CoV accessory proteins 6 and 7b are incorporated into virus particles and that 3a, 7a, and 9b are structural components of the virion (26, 41, 42). The SARS-CoV accessory proteins are suggested to have biological functions that include virus release, interferon antagonism, apoptosis induction, and inhibition of cellular protein synthesis (26, 41).
Another unique feature of ZBCoV is the presence of two highly conserved RNA sequences (s2m) downstream of the N gene. A single s2m is found at the 3’ end of the genomes of members of several RNA virus families, including the Coronaviridae and Astroviridae, as well as in the picornavirus ERBV (31-33). Recent data suggest that the SARS-CoV s2m RNA is a functional molecular mimic of the 530 stem-loop region of small-subunit ribosomal RNA, which could facilitate viral hijacking of the host’s protein synthesis machinery (43). The presence of a second s2m in ZBCoV may further increase the efficiency of this process. Interestingly, secondary structures downstream of the N gene, including bulged stem-loop and pseudoknot structures, have also been identified in the genomes of subgroup 2a and 2c CoVs (44, 45).

Lagos bat virus (family Rhabdoviridae, genus Lyssavirus) was first identified in Nigeria in the 1950s; however, the discovery of ZBCoV in a bat of the genus Hipposideros (family Hipposideridae) is the first identification of a coronavirus in wildlife from Nigeria. Recently, bat coronaviruses closely related to ZBCoV were identified in roundleaf bats (Hipposideros caffer and Hipposideros ruber) in Ghana, a country close to Nigeria (35). Phylogenetic analysis indicates that ZBCoV and GhanaBt-CoV form a unique clade that is distinct from subgroup 2b CoVs. However, because the only sequence available for GhanaBt-CoV is a fragment of the RdRp gene, the genome organizations of ZBCoV and GhanaBt-CoV cannot be compared. Our findings, together with recently published data showing that a SARS-CoV-like virus lacks ORF 8 (46), suggest that there is considerable diversity in the genome organization of SARS-CoV-like viruses.

SARS-CoV-like viruses have been identified in various rhinolophid bats (family Rhinolophidae, genus Rhinolophus), common insectivorous bats found in Africa and Eurasia. However, despite extensive studies, no SARS-CoV-like viruses have been reported in Hipposideros sp. bats in China (32). The Rhinolophus species suggested as reservoirs of SARS-CoV-like viruses are not present in Africa. A sequence fragment of a SARS-CoV-like virus was identified in Kenya in bats of the genus Chaerephon (family Molossidae) (47), and antibodies reactive with SARS-CoV antigen have also been detected in the sera of bats from seven different genera of insectivorous and fruit bats sampled in central and southern Africa (48). Taken together, these findings suggest that SARS-CoV-like viruses in African bats may not be subject to strict species-specific host restriction.

Our phylogenetic analysis indicates that the clade containing ZBCoV and GhanaBt-CoV occupies a position ancestral to subgroup 2b CoVs, which include SARS-CoV and SARS-CoV-like viruses. Similar to previous estimates, the TMRCA of these two clades was estimated at ~3,047 ybp (although with a wide 95% HPD). Although SARS-CoV-like viruses have been identified predominantly in bats in China, a recent sequence fragment (~120 bp) recovered from a Kenyan bat was found to occupy a position just outside subgroup 2b and may represent the ancestral African lineage of all subgroup 2b CoVs (47). Together with the position of the African ZBCoV/GhanaBt-CoV clade relative to subgroup 2b CoVs, this finding suggests that a migration event from Africa to China within the last 100 to 1,000 years may have given rise to the subgroup 2b lineage of CoVs. Indeed, the geographic distribution and phylogenetic relationships of bat coronaviruses seen both here (Fig. 4) and in previous work (35) suggest multiple independent migration events between Africa and Asia throughout the history of bat coronaviruses. Additional sequence data for the bat coronaviruses identified in Kenya, along with increased sampling for coronaviruses in Africa and in central and eastern Asia, will likely be necessary to reveal the timing and origin of this diverse group of coronaviruses.

Bats are important reservoir hosts of zoonotic viruses with significant impact on human health, including rabies virus, Nipah virus, Hendra virus, Zaire Ebola virus, Marburg virus, and SARS-CoV. The wide genetic diversity of zoonotic viruses in bats may increase the potential for the emergence of interspecies variants that cause outbreaks of disease in humans and domestic animals. The giant leaf-nosed bat, Hipposideros commersoni, is widespread in sub-Saharan Africa, from Gambia to Ethiopia, Mozambique, and Madagascar, but little is known about its ecology, population biology, or vector competence. To prevent and control future disease outbreaks, it is essential to enhance our knowledge of the diversity and co-occurrence of potential reservoir hosts and to better understand emerging pathogen dynamics and their public health relevance.


MATERIALS AND METHODS 


Bat sample collection. During June 2008, bats were collected with mist 
netting in caves and around human dwellings or manually from roost 
locations near Idanre and Zaria, Nigeria. All bats appeared clinically 
normal. Captured bats were anesthetized by intramuscular inocula- 
tion with ketamine hydrochloride (0.05 to 0.1 mg/g of body weight) 
and euthanized under sedation by intracardiac exsanguination and 
cervical dislocation. The species of each captured bat was recorded, as 
well as the sex, forearm and body lengths (in cm), and weight. All 
samples were initially stored and transported on ice packs, then stored
at −20°C until shipment on dry ice and final storage at
−80°C. No lyssavirus-specific antigens were identified in bat brains by
use of direct fluorescent antibody testing. 
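The anesthetic dose above scales linearly with body weight. As a worked example only, the hypothetical helper below converts the stated range of 0.05 to 0.1 mg/g into milligrams; the 120-g body weight is an assumed illustrative value, not a measurement from this study.

```python
def ketamine_dose_range_mg(body_weight_g: float) -> tuple[float, float]:
    """Hypothetical helper: (low, high) ketamine hydrochloride dose in mg.

    Applies the 0.05 to 0.1 mg/g range given in the text to a body weight in grams.
    """
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return 0.05 * body_weight_g, 0.1 * body_weight_g


if __name__ == "__main__":
    low, high = ketamine_dose_range_mg(120.0)  # assumed 120-g bat
    print(f"{low:.1f} to {high:.1f} mg")  # prints "6.0 to 12.0 mg"
```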

Coronavirus consensus PCRs. Coronavirus screening was performed 
by nested PCR, amplifying a 400-nt fragment of the RdRp genes of 
coronaviruses using consensus primers 5’-CGTTGGIACWAAYBTVCCWYTICARBTRGG-3’
and 5’-GGTCATKATAGCRTCAVMASWWGCNACNACATG-3’ for the first PCR and
consensus primers 5’-GGCWCCWCCHGGNGARCAATT-3’ and
5’-GGWAWCCCCAYTGYTGWAYRTC-3’ for the second PCR. Primers
were designed from multiple alignments of the nucleotide sequences of
available RdRp genes of known coronaviruses. Reverse transcription was
performed using the SuperScript III kit (Invitrogen, San Diego, CA). PCR
primers were used at 0.2-µM concentrations with 1 µl cDNA and HotStar
polymerase (Qiagen, Valencia, CA). Cycle conditions were as
follows: 1 cycle at 95°C for 15 min; 15 cycles at 95°C for 30 s, 65°C for 30 s
(−1°C/cycle), and 72°C for 45 s; 35 cycles at 94°C for 30 s, 50°C for 30 s,
and 72°C for 45 s; and 1 cycle at 72°C for 5 min.
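Two properties of this protocol can be made concrete: how many distinct oligonucleotides each degenerate primer encodes, and the annealing temperature in every cycle of the touchdown program. The sketch below is illustrative and not part of the original methods. It assumes standard IUPAC ambiguity codes and counts inosine (I) as a single universal base that adds no degeneracy. The helper names are hypothetical, and the default arguments mirror the cycle conditions stated above.

```python
# Number of bases each IUPAC code can represent. Counting inosine (I) as 1
# is an assumption: it is one synthetic base, not a mixture of bases.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1, "U": 1, "I": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

PRIMERS = {
    "first PCR, primer 1": "CGTTGGIACWAAYBTVCCWYTICARBTRGG",
    "first PCR, primer 2": "GGTCATKATAGCRTCAVMASWWGCNACNACATG",
    "second PCR, primer 1": "GGCWCCWCCHGGNGARCAATT",
    "second PCR, primer 2": "GGWAWCCCCAYTGYTGWAYRTC",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_FOLD[base]
    return total


def touchdown_schedule(start=65, step=-1, touchdown_cycles=15,
                       plateau_temp=50, plateau_cycles=35):
    """Annealing temperature (°C) for each amplification cycle, in order."""
    phase1 = [start + step * i for i in range(touchdown_cycles)]
    return phase1 + [plateau_temp] * plateau_cycles


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, {degeneracy(seq)}-fold degenerate")
    temps = touchdown_schedule()
    print(f"{len(temps)} cycles: {temps[0]} to {temps[14]} °C, then {temps[-1]} °C")
```

Under this counting, the two second-PCR primers encode 96 and 128 sequence variants, respectively. The touchdown phase steps annealing down from 65°C to 51°C before the 35 cycles at 50°C.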

UHTS. Total RNA was extracted for UHTS from the gastrointestinal tract
specimen that was positive for coronavirus. Purified RNA (0.5 µg)
was DNase I digested (DNA-free; Ambion, Austin, TX) and reverse
transcribed using a SuperScript II kit (Invitrogen) with random octamer
primers linked to an arbitrary, defined 17-mer primer sequence (MWG,
Huntsville, AL). cDNA was RNase H treated prior to random amplification
by PCR, using a 9:1 mixture of a primer corresponding to the defined
17-mer sequence and the octamer-linked 17-mer primer, respectively.
Products of >70 bp were purified (MinElute; Qiagen) and ligated to
linkers for sequencing on a GS FLX sequencer (454 Life Sciences,
Branford, CT).

Genome sequencing. PCR primers for amplification across sequence 
gaps were designed (available upon request) based on the UHTS data, and
the draft genome was sequenced from overlapping PCR products. Products
were purified (QIAquick PCR purification kit; Qiagen) and directly 
dideoxy sequenced in both directions with ABI Prism BigDye Terminator 
1.1 cycle sequencing kits (PerkinElmer Applied Biosystems, Foster City,
CA). Other methods used to complete the genome sequence included
additional consensus PCR and 3’ and 5’ RACE (Invitrogen).
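Building a draft genome from overlapping PCR products depends on joining each product to the next through the sequence they share. The sketch below illustrates only that principle, under simplifying assumptions: the products are already ordered 5′ to 3′, they are error free, and they overlap exactly. It is not the authors' assembly procedure. Real amplicon data would be aligned and reconciled from both sequencing directions with dedicated tools. Both function names are hypothetical.

```python
def merge_overlapping(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two reads on the longest exact suffix/prefix overlap.

    Finds the longest suffix of `left` equal to a prefix of `right`, with
    length >= min_overlap, and returns the merged sequence.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("no exact overlap of the required length")


def assemble_tiling(amplicons: list[str], min_overlap: int = 20) -> str:
    """Merge amplicons that are already ordered 5' to 3' along the genome."""
    contig = amplicons[0]
    for nxt in amplicons[1:]:
        contig = merge_overlapping(contig, nxt, min_overlap)
    return contig


if __name__ == "__main__":
    # Toy reads, purely illustrative.
    print(assemble_tiling(["AAACCCGGG", "CCGGGTTT", "GTTTACGA"], min_overlap=3))
```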

Phylogenetic and sequence analyses. Alignments were constructed 
using MUSCLE 3.7 (49) and adjusted manually using Se-Al (50). Maxi- 
mum likelihood (ML) phylogenetic trees containing representative taxa 
from each coronavirus genus (n = 31) (Fig. 3, legend) were constructed 
using the subtree pruning and regrafting (SPR) method of branch swap- 
ping in PhyML (51). Phylogenies were constructed using amino acid 
alignments for the complete proteins of 3CL, Hel, M, and N and partial 
protein alignments for the available RdRp protein sequence and for the S 
protein after regions with low alignment confidence were removed. In all 
cases, the Whelan and Goldman model of amino acid replacement was 
used (52), with a gamma distribution of rate heterogeneity. The value of
the gamma shape parameter (α) was estimated from the data, and the
distribution was approximated by six discrete rate categories. The
reliability of each branch in all phylogenies was estimated using a
bootstrap resampling procedure with 100 ML replicates.
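The bootstrap procedure builds pseudo-replicate alignments by resampling alignment columns with replacement. Each replicate is then analyzed by ML (PhyML in the text), and the support for a branch is the percentage of replicate trees that contain it. The sketch below covers only the resampling and the final percentage. The tree search itself is left to the phylogenetics program, and the helper names are hypothetical.

```python
import random


def bootstrap_replicates(alignment, n_replicates=100, seed=None):
    """Yield pseudo-replicate alignments by resampling columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (all the same length).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("sequences must be aligned to equal length")
    rng = random.Random(seed)
    for _ in range(n_replicates):
        # One draw of column indices per replicate, applied to every taxon,
        # so that sites stay homologous across sequences.
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in cols) for name in names}


def bootstrap_support(branch_found):
    """Percent of replicate trees in which a given branch (split) was recovered."""
    flags = list(branch_found)
    return 100.0 * sum(flags) / len(flags)
```

Each yielded replicate would be written out (for example, in PHYLIP format) and passed to the ML program. `bootstrap_support` then takes one True/False value per replicate tree for the branch of interest.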

To estimate the time to the most recent common ancestor (TMRCA) 
for the taxa contained within subgroup 2b CoVs and including ZBCoV, an 
additional 659-nt alignment of the RdRp gene was constructed, with the
region chosen to match the gene region sequenced for the coronaviruses
most closely related to ZBCoV (GhanaBt-CoV). All sequences for which
time-of-sampling information was available were included (n = 64). TMRCAs
were estimated using the Bayesian Markov chain Monte Carlo (MCMC)
method in the BEAST package, version 1.5.2 (53), with both the general
time-reversible (GTR) model plus a gamma (Γ) distribution and the SRD06
model of nucleotide substitution. A relaxed uncorrelated lognormal
molecular clock was used, calibrated by the time-stamped sequences, both
with and without an informative prior on the molecular clock rate of
2.0 × 10⁻⁴ ± 0.0009 nt substitutions/site/year (35). The analyses were
run until all parameters converged, with the first 10% of each MCMC chain
discarded as burn-in. Statistical confidence in the TMRCA estimates is
given by the 95% highest probability density (HPD) interval around the
marginal posterior parameter mean.
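The two summary steps described above are discarding burn-in and reporting the posterior mean with a 95% HPD interval. The sketch below applies them to a sampled TMRCA trace. It assumes the trace has already been read from the BEAST log into a plain list of numbers. It does not parse BEAST files, and tools such as Tracer normally report these summaries directly. The HPD here is the empirical shortest interval containing 95% of the retained samples. The function names are hypothetical.

```python
import math
from statistics import fmean


def discard_burnin(trace, fraction=0.1):
    """Drop the first `fraction` of an MCMC trace (burn-in)."""
    n_burn = int(len(trace) * fraction)
    return list(trace[n_burn:])


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples (empirical HPD)."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples")
    k = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def summarize_tmrca(trace, burnin=0.1, mass=0.95):
    """Posterior mean and HPD interval after burn-in removal."""
    kept = discard_burnin(trace, burnin)
    return fmean(kept), hpd_interval(kept, mass)
```

Unlike an equal-tailed interval, the shortest-interval construction follows the bulk of a skewed posterior. That property is useful for TMRCA estimates, which often have long right tails.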

The classification of ZBCoV and GhanaBt-CoV as a putative new sub- 
group within group 2 CoVs was determined by first calculating the per- 
cent pairwise nucleotide similarity of the same 659-nt region of RdRp 
genes between and within the existing subgroups of coronaviruses and 
then extending this comparison to include the clade ZBCoV/GhanaBt- 
CoV. To verify this approach, a nonparametric Mann-Whitney U test was 
used to assess if the pairwise nucleotide similarity within the currently 
accepted subgroups is different from that between subgroups. This test 
was then used to determine if the percent pairwise similarity within the 
clade ZBCoV/GhanaBt-CoV is statistically different from that of the most 
closely related subgroup 2b CoVs. 
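The classification step has two parts. Pairwise similarities are first pooled within and between groups, and the pools are then compared with a Mann-Whitney U test. The sketch below shows both parts under stated assumptions. "Percent pairwise nucleotide similarity" is taken here as percent identity over alignment columns where neither sequence has a gap, which is one common definition and may differ from the authors'. The U test uses the normal approximation without continuity or tie correction. In practice, `scipy.stats.mannwhitneyu` would be the usual choice. All function names are hypothetical.

```python
import math
from itertools import combinations


def percent_similarity(a: str, b: str) -> float:
    """Percent identical bases over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable (ungapped) columns")
    same = sum(1 for x, y in sites if x == y)
    return 100.0 * same / len(sites)


def within_between(groups: dict[str, list[str]]) -> tuple[list[float], list[float]]:
    """Pairwise similarities within each group and between every pair of groups."""
    within, between = [], []
    for seqs in groups.values():
        within += [percent_similarity(a, b) for a, b in combinations(seqs, 2)]
    for g1, g2 in combinations(list(groups), 2):
        between += [percent_similarity(a, b) for a in groups[g1] for b in groups[g2]]
    return within, between


def _average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """U statistic for x and a two-sided p-value from the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = _average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

For the first comparison in the text, `within` would be pooled across the accepted subgroups and `between` across subgroup pairs. The second comparison would contrast the within-clade values for ZBCoV/GhanaBt-CoV with those for subgroup 2b.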

Protein family analysis was performed using Pfam (http://pfam.sanger.ac.uk/).
Predictions of signal peptide cleavage sites, glycosylation sites,
and transmembrane domains were performed using the respective prediction
servers available at the Center for Biological Sequence Analysis
(http://www.cbs.dtu.dk/services/ and http://www.ch.embnet.org/software/TMPRED_form.html).
The percent amino acid sequence identity and
similarity were calculated using the Needleman-Wunsch algorithm with an
EBLOSUM62 substitution matrix (gap open/extension penalties of 10/0.1 for
nucleotide and amino acid alignments; EMBOSS [54]), using a Perl script
to iterate the process for all-versus-all comparisons. Prediction of RNA
secondary structures was performed with the mfold program
(http://mfold.bioinfo.rpi.edu/).
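The all-versus-all identity calculation can be illustrated with a plain Needleman-Wunsch global alignment run over every pair of sequences. The sketch below deliberately simplifies the stated settings. It uses a toy match/mismatch score and a linear gap penalty instead of EBLOSUM62 with affine 10/0.1 gap penalties, and it reports identity only, not similarity. Percent identity is computed over the full alignment length, including gap columns. This follows the spirit of EMBOSS output, but the exact convention is an assumption here. The function names are hypothetical. The script in the text called EMBOSS rather than reimplementing it.

```python
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of the full alignment length."""
    if not aln_a:
        raise ValueError("empty alignment")
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def all_vs_all(seqs):
    """Percent identity for every unordered pair in a {name: sequence} dict."""
    result = {}
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        aln_a, aln_b, _ = needleman_wunsch(s1, s2)
        result[(n1, n2)] = percent_identity(aln_a, aln_b)
    return result
```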

Nucleotide sequence accession number. The GenBank accession 
number for the ZBCoV sequence is HQ166910. 